Abstract

Consider the random walk on the permutation group obtained when the step
distribution is uniform on a given conjugacy class. It is shown that there
is a critical time at which two phase transitions occur simultaneously. On
the one hand, the random walk slows down abruptly (i.e., the acceleration
drops from $0$ to $-\infty$ at this time as $n \to \infty$). On the other
hand, the largest cycle size changes from microscopic to giant. The proof
of this last result is both considerably simpler and more general than in a
previous result of Oded Schramm (2005) for random transpositions. It turns
out that in the case of random $k$-cycles, this critical time is
proportional to $1/[k(k-1)]$, whereas the mixing time is known to be
proportional to $1/k$.
1 Introduction



1.1 Basic result 

Let $n \ge 1$ and let $S_n$ be the group of permutations of $\{1, \ldots, n\}$.
Consider the random walk on $S_n$ obtained by performing random
transpositions in continuous time, at rate 1. That is, let $\tau_1, \tau_2,
\ldots$ be a sequence of i.i.d. uniformly chosen transpositions among the
$n(n-1)/2$ possible transpositions of the set $V = \{1, \ldots, n\}$, and
for all $t \ge 0$, set

$$\sigma_t = \tau_1 \cdot \ldots \cdot \tau_{N_t},$$

where $(N_t, t \ge 0)$ is an independent Poisson process with rate 1. It is
well known that the permutation $\sigma_t$ is approximately a uniform random
permutation (in the sense of total variation distance) after time
$(1/2) n \log n$ (see [10]). In particular, this means that at this time,
most points belong to cycles which are of macroscopic size $O(n)$, while
initially, in the permutation $\sigma_0$, which is the identity permutation,
every cycle is microscopic (being of size 1). How long does it take for
macroscopic cycles to emerge? Oded Schramm, in a remarkable paper [21],
proved that the first giant cycles appear at time $n/2$. More precisely,
answering a conjecture of David Aldous stated in [3], he was able to prove
that if $t = cn$ with $c > 1/2$, then there exists a (random) set
$W \subset \{1, \ldots, n\}$ satisfying $\sigma_t(W) = W$, such that
$|W| \sim \theta n$ where $0 < \theta = \theta(c) < 1$, and furthermore, the
cycle lengths of $\sigma_t|_W$, rescaled by $\theta n$, converge in the sense
of finite-dimensional distributions towards a Poisson-Dirichlet random
variable. (The Poisson-Dirichlet distribution describes the limiting cycle
distribution of a uniform random permutation and will be described in more
detail below.) In particular, this implies that $\sigma_t$ contains giant
cycles with high probability. On the other hand, it is easy to see that no
macroscopic cycle can occur if $c < 1/2$. His proof is separated into two
main steps. The first step consists in showing that giant cycles do emerge
prior to time $cn$ when $c > 1/2$. The second step is a beautiful coupling
argument which shows that once giant cycles exist they must quickly come
close to equilibrium, thereby proving Aldous' conjecture. Of these two
steps, the first is arguably the more technically involved. Our main
purpose in this paper is to give an elementary and transparent new proof of
this fact. Let $\Lambda(t)$ denote the size of the largest cycle of
$\sigma_t$. For $\delta > 0$, define

$$\tau_\delta = \inf\{t \ge 0 : \Lambda(t) > \delta n\}. \tag{1}$$

Theorem 1. For any $c > 1/2$, $\tau_\delta < cn$ with high probability, where
This proof is completely elementary and in particular requires almost no
estimate. As a consequence, it is fairly robust and it can be hoped that it 
extends to further models. We illustrate this by applying it to more general 
random walks on S n , whose step distribution is uniform on a given conjugacy 
class of the permutation group (definitions will be recalled below). We show 
that the emergence of giant cycles coincides with a phase transition in the 
speed of the random walk, as measured by the derivative of the distance 
(with respect to the graph metric) between the position of the random walk 
at time t, and its starting point. This phase transition in the speed is the 
analogue of the phase transition described in [3] for random transpositions. 

We mention that Theorem 1 is the mean-field analogue of a question
arising in statistical mechanics in the study of Bose condensation and the
quantum ferromagnetic Heisenberg model (see Tóth [22]). Very few rigorous
results are known about this model on graphs with non-trivial geometry,
with the exception of the work of Angel [1] for the case of a $d$-regular
tree with $d$ sufficiently large. We believe that the proof of Theorem 1
proposed here opens up the challenging possibility of proving analogous
results on graphs that are "sufficiently high-dimensional", such as a
high-dimensional hypercube, for which the percolation picture has recently
started to emerge: see, e.g., Borgs et al. [8].

1.2 Random walks based on conjugacy classes. 

Fix a number $k \ge 2$, and call an element $\gamma \in S_n$ a $k$-cycle, or
a cyclic permutation of length $k$, if there exist pairwise distinct
elements $x_1, \ldots, x_k \in \{1, \ldots, n\}$ such that
$\gamma(x) = x_{i+1}$ if $x = x_i$ (where $1 \le i \le k$ and
$x_{k+1} := x_1$) and $\gamma(x) = x$ otherwise. Thus for $k = 2$, a 2-cycle
is simply a transposition. If $\sigma$ is a permutation then $\sigma$ can be
decomposed into a product of cyclic permutations
$\sigma = \gamma_1 \cdot \ldots \cdot \gamma_r$, where $\cdot$ stands for the
composition of permutations. (This decomposition is unique up to the order
of the terms.) A conjugacy class $\Gamma \subset S_n$ is any set that is
invariant by conjugacy $\sigma \mapsto \pi^{-1} \sigma \pi$, for all
$\pi \in S_n$. It is easily seen that a conjugacy class of $S_n$ is exactly a
set of permutations having a given cycle structure, say
$(k_2, \ldots, k_J)$, i.e., consisting of $k_2$ cycles of size 2, ..., $k_J$
cycles of size $J$ in their cycle decomposition (and a number of fixed
points which does not need to be explicitly stated). Note that if $\Gamma$
is a fixed conjugacy class of $S_n$, and $m > n$, $\Gamma$ can also be
considered a conjugacy class of $S_m$ by simply adding $m - n$ fixed points
to any permutation $\sigma \in \Gamma$.

Let $\Gamma$ be a fixed conjugacy class, and consider the random walk in
continuous time on $S_n$ where the step distribution is uniform on $\Gamma$.
That is, let $(\gamma_i, i \ge 1)$ be an i.i.d. sequence of elements
uniformly distributed on $\Gamma$, and let $(N_t, t \ge 0)$ be an
independent rate 1 Poisson process. Define a random process:

$$\sigma_t := \gamma_1 \cdot \ldots \cdot \gamma_{N_t}, \qquad t \ge 0, \tag{2}$$

where $\cdot$ stands for the composition of two permutations. Thus the case
where $\Gamma$ consists only of transpositions (i.e. $k_2 = 1$ and $k_j = 0$
if $j > 2$) corresponds to the familiar random process on $S_n$ obtained by
performing random transpositions in continuous time, and the case where
$\Gamma$ contains only one nontrivial cycle of size $k \ge 2$ will be
referred to as the random $k$-cycles random walk. The process
$(\sigma_t, t \ge 0)$ may conveniently be viewed as a random walk on $G_n$,
the Cayley graph of $S_n$ generated by $\Gamma$. Note that if
$|\Gamma| = \sum_{j=2}^{J} j k_j$ is even, the graph $G_n$ is connected but
it is not when $|\Gamma|$ is odd: indeed, in that case, the product of
random $p$-cycles must be an even permutation, and thus $\sigma_t$ is then a
random walk on the alternating group $A_n$ of even permutations. This fact
will be of no relevance in what follows.

In this paper we study the pre-equilibrium behaviour of such a random
walk. Our main result for this process is that there is a phase transition
which occurs at time $t_c n$, where

$$t_c = \Big( \sum_{j=2}^{J} j(j-1) k_j \Big)^{-1}. \tag{3}$$
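As a small illustration of (3), the following sketch evaluates $t_c$ for a few conjugacy classes. The function name `critical_time` and the dictionary encoding `{j: k_j}` of the cycle structure are choices made here for illustration only; they are not notation from the paper.

```python
from fractions import Fraction

def critical_time(cycle_structure):
    """Return t_c = 1 / sum_j j (j - 1) k_j for a class given as {cycle size j: multiplicity k_j}."""
    inverse = sum(j * (j - 1) * k_j for j, k_j in cycle_structure.items())
    if inverse == 0:
        raise ValueError("the class must contain at least one nontrivial cycle")
    return Fraction(1, inverse)

print(critical_time({2: 1}))        # random transpositions
print(critical_time({3: 1}))        # random 3-cycles
print(critical_time({2: 1, 3: 1}))  # one transposition and one 3-cycle per step
```

For a single $k$-cycle the sum reduces to $k(k-1)$, which is the value $t_c = 1/[k(k-1)]$ quoted for random $k$-cycles.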

This transition concerns two distinct features of the walk. On the one hand,
giant cycles emerge at time $t_c n$ precisely, as in Theorem 1. On the other
hand, the speed of the walk changes dramatically at this time, dropping
below 1 in a non-differentiable way. We start with the emergence of giant
cycles, which is analogous to Theorem 1. Recall the definition of
$\tau_\delta$ in (1).

Theorem 2. Let $t < t_c$. Then there exists $\beta > 0$ such that no cycle is
greater than $\beta \log n$ with high probability. On the other hand, for any
$t > t_c$ there exists $\delta > 0$ such that $\tau_\delta < tn$ with high
probability.

We now state our result for the speed. Denote by $d(x, y)$ the graph
distance between two vertices $x, y \in S_n$, and for $t \ge 0$, let

$$d(t) = d(o, \sigma_t),$$

where $o$ is the identity permutation of $S_n$. Recall that a sequence of
random functions $X_n(t)$ converges uniformly on compact sets of
$S \subset \mathbb{R}$ in probability (u.c.p. for short) towards a random
function $X(t)$ if
$P(\sup_{t \in S,\, t \le T} |X_n(t) - X(t)| > \varepsilon) \to 0$ as
$n \to \infty$ for all $\varepsilon > 0$ and $T > 0$.

Theorem 3. Fix a constant integer $J \ge 2$ and constant nonnegative integers
$k_2, \ldots, k_J$, and consider the conjugacy class $\Gamma$ of $S_n$
defined by $(k_2, \ldots, k_J)$. Let $t_c$ be as in (3), and fix $t > 0$.
Then there exists a compact interval $I \subset (t_c, \infty)$, and a
nonrandom function $\varphi(t)$ satisfying $\varphi(t) = t$ for $t \le t_c$
and $\varphi(t) < t$ for $t > t_c$, such that

$$\frac{1}{n} d(tn) \to \varphi(t), \qquad t \in \mathbb{R} \setminus I, \tag{4}$$

uniformly on compact sets in probability as $n \to \infty$. Furthermore
$\varphi$ is $C^\infty$ everywhere except at $t = t_c$, where the
acceleration satisfies $\varphi''(t_c^+) = -\infty$. In the case of random
$k$-cycles ($k \ge 2$), $I = \emptyset$, so the convergence holds uniformly on
compact sets in $\mathbb{R}$.

Remark 4. We believe that $I = \emptyset$ in all cases, but our proof only guarantees
this in the case of random $k$-cycles and a few other cases which we have
not tried to describe precisely. Roughly speaking there is a combinatorial 
problem which arises when we try to estimate the distance to the identity 
in the case of conjugacy classes which contain several non-trivial cycles of 
distinct sizes (particularly when these are coprime). This is explained in 
more details in the course of the proof. Right now, the current result is 
enough to prove that there is a phase transition for d(tn) when t = t c , but 
does not prevent other phase transitions after that time. 

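In the case of random $k$-cycles, we have $t_c = 1/[k(k-1)]$ and the function
$\varphi$ has the following explicit expression:

$$\varphi(t) := \frac{1}{k-1} \Big[ 1 - \sum_{h=0}^{\infty} \frac{((k-1)h+1)^{h-2}}{h!} (kt)^h e^{-kt(h(k-1)+1)} \Big]. \tag{5}$$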

It is a remarkable fact that for $t < t_c$ a cancellation takes place and
$\varphi(t) = t$. The case $k = 2$ of random transpositions matches Theorem 4
from [3]. In the general conjugacy class case, $\varphi$ may be described as
the solution to a certain differential equation. For $t \ge 0$ and
$z \in [0,1]$, let
$G_t(z) = \exp(-|\Gamma| t + t \sum_{j=2}^{J} j k_j z^{j-1})$, and let
$\rho = \rho(t)$ be the smallest solution of the equation (in $z$):
$G_t(z) = z$. Write $\theta(t) := 1 - \rho(t)$. Then $\varphi$ is defined by

$$\varphi(t) = \int_0^t [1 - \theta(s)^2] \, ds. \tag{6}$$

It is a fact that $\theta(t) > 0$ if and only if $t > t_c$, which explains
why $\varphi(t) = t$ for $t < t_c$ and $\varphi(t) < t$ for $t > t_c$.
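The definition of $\varphi$ through the smallest root of $G_t(z) = z$ can be explored numerically. The sketch below is only an illustration under stated assumptions: the function names `G`, `theta`, `phi`, the fixed-point iteration started at $z = 0$ (which increases to the smallest root because $G_t$ is increasing on $[0,1]$), and the midpoint rule for the integral are choices made here, not part of the paper.

```python
import math

def G(t, z, cycle_structure):
    """G_t(z) = exp(-|Gamma| t + t * sum_j j k_j z^(j-1)), class given as {j: k_j}."""
    size = sum(j * k for j, k in cycle_structure.items())  # |Gamma|
    return math.exp(-size * t + t * sum(j * k * z ** (j - 1)
                                        for j, k in cycle_structure.items()))

def theta(t, cycle_structure, tol=1e-13, max_iter=100_000):
    """theta(t) = 1 - rho(t), where rho(t) is the smallest root of G_t(z) = z in [0, 1]."""
    z = 0.0
    z_next = z
    for _ in range(max_iter):
        z_next = G(t, z, cycle_structure)
        if abs(z_next - z) < tol:
            break
        z = z_next
    return 1.0 - z_next

def phi(t, cycle_structure, steps=2000):
    """phi(t) = integral over [0, t] of 1 - theta(s)^2, approximated by the midpoint rule."""
    ds = t / steps
    return sum((1.0 - theta((i + 0.5) * ds, cycle_structure) ** 2) * ds
               for i in range(steps))

transpositions = {2: 1}                # here t_c = 1/2
print(theta(0.4, transpositions))      # numerically zero below t_c
print(theta(1.0, transpositions))      # strictly positive above t_c
print(phi(0.4, transpositions))        # close to 0.4, i.e. phi(t) = t below t_c
```

Close to $t_c$ the iteration converges slowly, so the tolerance and iteration cap above are a pragmatic compromise rather than a tuned numerical method.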

1.3 Heuristics 

The $k$-cycle random walk is a simple generalization of the random
transpositions random walk on $S_n$, for which the phase transition in
Theorem 3 was proved in [3]. Observe that any $k$-cycle
$(x_1, \ldots, x_k)$ may always be written as the product of $k - 1$
transpositions:

$$(x_1, \ldots, x_k) = (x_1, x_2)(x_2, x_3) \cdots (x_{k-1}, x_k).$$
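The factorisation above is easy to check mechanically. The following sketch assumes permutations are stored as Python lists on $\{0, \ldots, n-1\}$ and that products are read right to left (the rightmost factor is applied first); the helper names are hypothetical.

```python
def cycle_to_perm(cycle, n):
    """Permutation of {0,...,n-1} (as a list) mapping x_1 -> x_2 -> ... -> x_k -> x_1."""
    perm = list(range(n))
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        perm[a] = b
    return perm

def compose(f, g):
    """Return the composition 'f after g': apply g first, then f."""
    return [f[g[x]] for x in range(len(f))]

def product_of_transpositions(cycle, n):
    """Multiply (x_1,x_2)(x_2,x_3)...(x_{k-1},x_k), rightmost factor applied first."""
    result = list(range(n))
    for a, b in zip(cycle, cycle[1:]):
        result = compose(result, cycle_to_perm([a, b], n))
    return result

cycle = [0, 3, 1, 4]   # a 4-cycle inside S_6
print(product_of_transpositions(cycle, 6) == cycle_to_perm(cycle, 6))
```

With the opposite composition convention the same product yields the inverse cycle, so the convention has to be fixed before comparing.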
This suggests that, qualitatively speaking, the $k$-cycle random walk should
behave as "random transpositions sped up by a factor of $(k-1)$", and thus
one might expect that phase transitions occur at a time that is inversely
proportional to $k$. This is for instance what happens with the mixing time

$$t_{\mathrm{mix}} = \frac{1}{k} n \log n \tag{7}$$

for the total variation distance. (This was recently proved in [5] and was
already known for $k \le 6$, the particular case $k = 2$ being the celebrated
Diaconis-Shahshahani theorem [10]; see [16] and [9] for an excellent
introduction to the general theory of mixing times, and [20] in particular
for mixing times of random walks on groups.) It may therefore come as a
surprise that $t_c = 1/[k(k-1)]$ rather than $t_c = 1/k$. As it emerges from
the proof, the reason for this fact is as follows. We introduce a coupling
of $(\sigma_t, t \ge 0)$ with a random hypergraph process $(H_t, t \ge 0)$ on
$V = \{1, \ldots, n\}$, which is the analogue of the coupling between random
transpositions and Erdős-Rényi random graphs introduced in [3]. As we will
see in more detail, hypergraphs are graphs where edges (or rather
hyperedges) may connect several vertices at the same time. In this
coupling, every time a cycle $(x_1, \ldots, x_k)$ is performed in the random
walk, $H_t$ gains a hyperedge connecting $x_1, \ldots, x_k$. This is
essentially the same as adding the complete graph on
$\{x_1, \ldots, x_k\}$ in the graph $H_t$. Thus the degree of a typical
vertex grows at a speed which is $k(k-1)/2$ faster than in the standard
Erdős-Rényi random graph. This results in a giant component occurring
$k(k-1)/2$ faster as well. This explains the formula $t_c^{-1} = k(k-1)$,
and an easy generalisation leads to (3).

Organisation of the paper: The rest of the paper is organised as
follows. We first give the proof of Theorem 1. In the following section we
introduce the coupling between $(\sigma_t, t \ge 0)$ and the random
hypergraph process $(H_t, t \ge 0)$. In cases where the conjugacy class is
particularly simple (e.g. random $k$-cycles), a combinatorial treatment
analogous to the classical analysis of the Erdős-Rényi random graph is
possible, leading to exact formulae. In cases where the conjugacy class is
arbitrary, our method is more probabilistic in nature and the formulae take
a different form ($H_t$ is then closer to the Molloy and Reed model of random
graphs with prescribed degree distribution, [17] and [18]). The proof is
thus slightly different in these two cases (respectively dealt with in
Sections 3 and 4), even though conceptually there are no major differences
between the two cases.

2 Emergence of giant cycles in random transpositions

In this section we give a full proof of Theorem 1. As the reader will observe,
the proof is really elementary and is based on well-known (and easy) results
on random graphs. Consider the random graph process $(G_t, t \ge 0)$ on
$V = \{1, \ldots, n\}$ obtained by putting an edge between $i$ and $j$ if the
transposition $(i, j)$ has occurred prior to time $t$. Then every edge is
independent and has probability $p_t = 1 - e^{-t/\binom{n}{2}}$, so $G_t$ is a
realisation of the Erdős-Rényi random graph $G(n, p_t)$.

For $t \ge 0$ and $i \in V$, let $C_i$ denote the cycle that contains $i$.
Recall that if $C_i = C_j$ then a transposition $(i, j)$ yields a
fragmentation of $C_i = C_j$ into two cycles, while if $C_i \ne C_j$ then the
transposition $(i, j)$ yields a coagulation of $C_i$ and $C_j$. It follows
from this observation that every cycle of $\sigma_t$ is a subset of one of
the connected components of $G_t$. Thus let $N(t)$ be the number of cycles of
$\sigma_t$ and let $\bar N(t)$ denote the number of components of $G_t$.
Then we obtain

$$N(t) \ge \bar N(t), \qquad t \ge 0. \tag{8}$$
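The coupling behind (8) can be simulated directly. The sketch below is a simplification made for illustration: it applies a fixed number of uniform transpositions instead of a Poisson number, and all names (`simulate`, `find`, the seed) are hypothetical. It tracks the permutation together with a union-find structure for the graph, and checks that each cycle stays inside one component.

```python
import random

def simulate(n, steps, seed=0):
    """Apply `steps` uniform random transpositions.

    Returns (number of cycles, number of graph components, largest cycle size).
    """
    rng = random.Random(seed)
    sigma = list(range(n))      # current permutation, starting at the identity
    parent = list(range(n))     # union-find forest representing the graph G_t

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        sigma[i], sigma[j] = sigma[j], sigma[i]   # multiply sigma by the transposition (i, j)
        parent[find(i)] = find(j)                 # add the edge {i, j} to the graph

    seen, cycle_sizes = [False] * n, []
    for start in range(n):
        if seen[start]:
            continue
        size, x = 0, start
        while not seen[x]:
            seen[x] = True
            assert find(x) == find(start)   # a cycle never leaves its component
            size, x = size + 1, sigma[x]
        cycle_sizes.append(size)
    components = len({find(v) for v in range(n)})
    return len(cycle_sizes), components, max(cycle_sizes)

for c in (0.25, 0.75, 1.5):
    n = 2000
    print(c, simulate(n, int(c * n), seed=1))
```

Running it for several values of `steps / n` on either side of $1/2$ gives a rough picture of when large cycles appear, although a single run at moderate `n` is of course noisy.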

Now it is a classical and easy fact that the number $\bar N(t)$ has a phase
transition at time $n/2$ (corresponding to the emergence of a giant
component at this time). More precisely, let $\theta(c)$ be the asymptotic
fraction of vertices in the giant component at time $cn$, so $\theta(c)$ is
the survival probability of a Poisson Galton-Watson process with mean
offspring $2c$ (in particular $\theta(c) = 0$ if $c < 1/2$).

Let $c > 1/2$ and fix an interval of time $[t_1, t_2]$ such that $t_2 = cn$
and $t_1 = t_2 - n^{3/4}$. Our goal will be to prove that a cycle of size
$\delta n$ occurs during the interval $I = [t_1, t_2]$, where
$\delta = \theta(c)^2 / 8$.

Lemma 5. As $n \to \infty$,

$$\bar N(t_1) - \bar N(t_2) \sim (t_2 - t_1)[1 - \theta^2(c)]$$

in the sense that the ratio of these two quantities tends to 1 in probability.

Proof. This lemma follows easily from the following observation. The total
number of edges that are added during $I$ is a Poisson random variable with
mean $t_2 - t_1$. Now, each time an edge is added to $G_t$, this changes the
number of components by $-1$ if and only if the two endpoints are in distinct
components (otherwise the change is 0). Since the second largest component
has size smaller than $\beta \log n$ with high probability, except on an
event of probability tending to 0, throughout $[t_1, t_2]$ this occurs if and
only if the two endpoints are not both in the giant component, which has
probability uniformly close to $1 - \theta^2(c)$. The law of large numbers
concludes the proof. □

Lemma 6.

$$E\Big( \sup_{t \le cn} |N(t) - \bar N(t)| \Big) \le 3c \, n^{1/2}.$$

Proof. We already know that $N(t) \ge \bar N(t)$ for all $t \ge 0$. It thus
suffices to control that the excess number of cycles is never more than
$4 n^{1/2}$ in expectation. Note first that there can never be more than
$n^{1/2}$ cycles of size greater than $n^{1/2}$. Thus it suffices to count
the number $N^{\mathrm{ex}}_{\downarrow}(t)$ of excess cycles of size
$\le n^{1/2}$:

$$|N(t) - \bar N(t)| \le N^{\mathrm{ex}}_{\downarrow}(t) + n^{1/2}.$$

These excess cycles of size $\le n^{1/2}$ at time $t$ must have been
generated by a fragmentation at some time $s \le t$ where one of the two
pieces was smaller than $n^{1/2}$. But at each step, the probability of
making such a fragmentation is smaller than $2 n^{-1/2}$. Indeed, given the
position of the first marker $i$, there are at most $2 n^{1/2}$ possible
choices for $j$ which result in a fragmentation of size smaller than
$n^{1/2}$. To see this, note that if a transposition $(i, j)$ is applied to a
permutation $\sigma$, and $C_i = C_j$, so $\sigma^k(i) = j$, then the two
pieces are precisely given by $(\sigma^0(i), \ldots, \sigma^{k-1}(i))$ and
$(\sigma^0(j), \ldots, \sigma^{|C_i| - k - 1}(j))$. Thus to obtain a piece of
size $k$ there are at most two possible choices, which are $\sigma^k(i)$ and
$\sigma^{-k}(i)$. Thus $E(F_{\downarrow}(cn)) \le cn \cdot 2 n^{-1/2}$, where
$F_{\downarrow}(cn)$ is the total number of fragmentation events where one of
the pieces is smaller than $n^{1/2}$ by time $cn$. Since

$$\sup_{t \le cn} N^{\mathrm{ex}}_{\downarrow}(t) \le F_{\downarrow}(cn),$$

this finishes the proof. □
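The description of the two pieces produced by a fragmentation can be checked on a small example. In the sketch below, permutations are lists on $\{0, \ldots, n-1\}$ and the transposition is applied after $\sigma$; this convention is an assumption chosen here so that the pieces appear exactly as $(\sigma^0(i), \ldots, \sigma^{k-1}(i))$ and $(\sigma^0(j), \ldots)$. The helper names are hypothetical.

```python
def cycle_of(perm, i):
    """List the cycle of `perm` containing i as [i, perm(i), perm^2(i), ...]."""
    cycle, x = [i], perm[i]
    while x != i:
        cycle.append(x)
        x = perm[x]
    return cycle

def apply_transposition(perm, i, j):
    """Return 'tau after perm' with tau = (i, j): apply perm first, then swap i and j."""
    swap = {i: j, j: i}
    return [swap.get(y, y) for y in perm]

sigma = [1, 2, 3, 4, 5, 6, 0]      # a single 7-cycle 0 -> 1 -> ... -> 6 -> 0
i, k = 0, 3
j = cycle_of(sigma, i)[k]          # j = sigma^k(i)
new = apply_transposition(sigma, i, j)
print(cycle_of(new, i), cycle_of(new, j))   # pieces of sizes k and 7 - k

# For a fixed i, count the choices of j that create a piece of size exactly 2.
choices = [
    j for j in range(1, 7)
    if 2 in (len(cycle_of(apply_transposition(sigma, 0, j), 0)),
             len(cycle_of(apply_transposition(sigma, 0, j), j)))
]
print(choices)
```

The second computation illustrates the counting step of the proof: for a given first marker and a given piece size there are at most two admissible second markers.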

Proof of Theorem 1. Applying Markov's inequality in Lemma 6, we see that
since $n^{1/2} \ll n^{3/4} = t_2 - t_1$, we also have

$$N(t_1) - N(t_2) \sim (t_2 - t_1)(1 - \theta^2(c))$$

in probability, by Lemma 5. On the other hand, $N(t)$ changes by $-1$ in the
case of a coalescence and by $+1$ in the case of a fragmentation. Hence
$N(t_1) - N(t_2) = \mathrm{Poisson}(t_2 - t_1) - 2F(I)$, where $F(I)$ is the
total number of fragmentations during the interval $I$. We therefore obtain
by the law of large numbers for Poisson random variables:

$$F(I) \sim \frac{1}{2} (t_2 - t_1) \theta(c)^2.$$

But observe that if $F(I)$ is large, it cannot be the case that all cycles
are small; otherwise we would very rarely pick $i$ and $j$ in the same cycle.
Hence consider the event $E = \{\tau_\delta < cn\}$. On $E^c$, the maximal
cycle size throughout $I$ is no more than $\delta n$. Hence at each
transposition, the probability of making a fragmentation is no more than
$\delta$. By the law of large numbers, on the event $E^c$, it must be that
$F(I) \le 2\delta (t_2 - t_1)$. Since $2\delta = \theta(c)^2/4$, it follows
immediately that $P(E^c) \to 0$ as $n \to \infty$. This completes the proof. □
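The bookkeeping in the proof above can be laid out as a short chain of relations. This is only a restatement of the argument, using the quantities $F(I)$, $\theta(c)$, $\delta$ and the event $E$ already defined there; every step in $I$ is either a coalescence or a fragmentation, which gives the first line.

```latex
\begin{align*}
N(t_1) - N(t_2)
  &= \#\{\text{coalescences in } I\} - \#\{\text{fragmentations in } I\}
   = \mathrm{Poisson}(t_2 - t_1) - 2F(I), \\
2F(I)
  &\sim (t_2 - t_1) - (t_2 - t_1)\bigl(1 - \theta(c)^2\bigr)
   = (t_2 - t_1)\,\theta(c)^2, \\
\text{on } E^c: \quad F(I)
  &\le 2\delta\,(t_2 - t_1)
   = \tfrac{1}{4}\,\theta(c)^2\,(t_2 - t_1)
   < \tfrac{1}{2}\,\theta(c)^2\,(t_2 - t_1).
\end{align*}
```

The last line is incompatible with the second one for large $n$, which is why $P(E^c)$ must tend to 0.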

Remark 7. This proof is partly inspired by the calculations in Lemma 8 of 
3 Random hypergraphs and Theorem 3

We now start the proof of Theorem 3. We first review some relevant
definitions and results from random hypergraphs.

A hypergraph is a graph where edges can connect several vertices at the
same time. Formally:

Definition 1. A hypergraph $H = (V, E)$ is given by a set $V$ of vertices and
a subset $E$ of $\mathcal{P}(V)$, where $\mathcal{P}(V)$ denotes the set of
all subsets of $V$. The elements of $E$ are called hyperedges. A $d$-regular
hypergraph is a hypergraph where all edges connect $d$ vertices, i.e. for all
$e \in E$, $|e| = d$.

For a given $d \ge 2$ and $0 < p < 1$, we call $G_d(n, p)$ the probability
distribution on $d$-regular hypergraphs on $V = \{1, \ldots, n\}$ where each
hyperedge on $d$ vertices is present independently of the other hyperedges
with probability $p$. Observe that when $d = 2$ this is just the usual
Erdős-Rényi random graph case, since a hyperedge connecting two vertices is
nothing else than a usual edge. For basic facts on Erdős-Rényi random
graphs, see e.g. [7].

The notion of a hypertree needs to be carefully formulated in what follows.
We start with the $d$-regular case. The excess $\mathrm{ex}(H)$ of a given
$d$-regular hypergraph $H$ is defined to be

$$\mathrm{ex}(H) = (d - 1)h - r, \tag{9}$$

where $r = |H|$ and $h$ is the number of edges in $H$.
Observe that if $H$ is connected then $\mathrm{ex}(H) \ge -1$.

Definition 2. We call a connected $d$-regular hypergraph $H$ a hypertree if
$\mathrm{ex}(H) = -1$.

Likewise if $\mathrm{ex}(H) = 0$ and $H$ is connected we will say that $H$ is
unicyclic, and if the excess is positive we will say that the component is
complex.

Remark 8. This is the definition used by Karoński and Łuczak in [15],
but differs from the definition in their older paper [13], where a hypertree
is a connected hypergraph such that removing any hyperedge would make it
disconnected.

In the case where $H$ is not necessarily regular, the excess of a connected
hypergraph $H$ made up of the hyperedges $h_1, \ldots, h_n$ is defined to be
$\mathrm{ex}(H) = \sum_{i=1}^{n} (|h_i| - 1) - |H|$, where $|h_i|$ denotes the
size of the hyperedge $h_i$ and $|H|$ is the cardinality of the vertex set of
$H$. Then $\mathrm{ex}(H) \ge -1$ and $H$ is said to be a hypertree if
$\mathrm{ex}(H) = -1$.
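The excess and the resulting classification are simple to compute. The sketch below is a minimal illustration; the representation of a hypergraph as a vertex set plus a list of Python sets, and the names `excess`, `is_connected`, `classify`, are assumptions made here.

```python
def excess(vertices, hyperedges):
    """ex(H) = sum over hyperedges of (|e| - 1), minus the number of vertices of H."""
    return sum(len(e) - 1 for e in hyperedges) - len(vertices)

def is_connected(vertices, hyperedges):
    """Grow the set of vertices reachable from one vertex by absorbing touching hyperedges."""
    vertices = set(vertices)
    reached = {next(iter(vertices))}
    grew = True
    while grew:
        grew = False
        for e in hyperedges:
            e = set(e)
            if e & reached and not e <= reached:
                reached |= e
                grew = True
    return reached == vertices

def classify(vertices, hyperedges):
    """Name a connected hypergraph according to its excess."""
    if not is_connected(vertices, hyperedges):
        raise ValueError("the hypergraph must be connected")
    ex = excess(vertices, hyperedges)
    return "hypertree" if ex == -1 else "unicyclic" if ex == 0 else "complex"

print(classify({1, 2, 3, 4, 5}, [{1, 2, 3}, {3, 4, 5}]))          # two triples sharing one vertex
print(classify({1, 2, 3, 4}, [{1, 2, 3}, {2, 3, 4}]))             # two triples sharing two vertices
print(classify({1, 2, 3, 4}, [{1, 2, 3}, {2, 3, 4}, {1, 3, 4}]))  # three triples on four vertices
```

For a $d$-regular hypergraph the sum reduces to $(d-1)h$, so the same function covers both (9) and the general definition.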
3.1 Critical point for random hypergraphs

We start by recalling a theorem by Karoński and Łuczak [15] concerning the
emergence of a giant connected component in a random hypergraph process
$(H_t, t \ge 0)$ where random hyperedges of degree $d \ge 2$ are added at
rate 1.

Theorem 9. Let $c > 0$ and let $t = cn$.

- When $c < c_d = 1/[d(d-1)]$, then a.a.s. $H_t$ contains only trees and
unicyclic components. The largest component has size $O(\log n)$ with
high probability.

- When $c > c_d$, then there is a.a.s. a unique complex component, of size
$\theta n$ asymptotically, where $\theta = \theta_d(c) > 0$. All other
components are not larger than $O(\log n)$ with high probability.

Note that if $c < c_d$ the number of unicyclic components is no more
than $C' \log n$ for some $C' > 0$ which depends on $c$. Indeed, at each step
the probability of creating a cycle is bounded above by $C \log n / n$ since
the largest component is no more than $O(\log n)$ prior to time $cn$. Since
there are $O(n)$ steps this proves the claim. We will need a result about
the evolution of the number of components $N(t)$ in $(H_t, t \ge 0)$.

Proposition 10. Let $t > 0$. Then as $n \to \infty$,

$$\frac{1}{n} N(tn) \overset{p}{\longrightarrow} \sum_{h=0}^{\infty} \frac{((d-1)h+1)^{h-2}}{h!} (dt)^h e^{-dt(h(d-1)+1)}.$$
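The limiting series in Proposition 10 can be evaluated numerically. The sketch below sums it in log-space to avoid overflow; the function name and the truncation at a fixed number of terms are assumptions made for illustration. Below the critical value $1/[d(d-1)]$ the result should agree with $1 - (d-1)t$, which is what one expects when essentially every hyperedge merges $d$ distinct components.

```python
import math

def limiting_component_density(d, t, terms=5000):
    """Sum over h >= 0 of r^(h-2) / h! * (d t)^h * exp(-d t r), where r = (d-1) h + 1."""
    if t == 0:
        return 1.0
    total = 0.0
    for h in range(terms):
        r = (d - 1) * h + 1
        log_term = ((h - 2) * math.log(r) - math.lgamma(h + 1)
                    + h * math.log(d * t) - d * t * r)
        total += math.exp(log_term)
    return total

print(limiting_component_density(2, 0.3), 1 - 0.3)        # d = 2, below 1/2
print(limiting_component_density(3, 0.1), 1 - 2 * 0.1)    # d = 3, below 1/6
print(limiting_component_density(2, 1.0))                 # above the critical value
```

Above the critical value the series is strictly larger than $1 - (d-1)t$, reflecting the hyperedges that fall inside the giant component.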

Proof. Note first that, by monotonicity of the number of clusters and
continuity of the function in the right-hand side, it suffices to establish
this result when $t \ne 1/[d(d-1)]$. Moreover, by Theorem 9 and since there
are no more than $C \log n$ unicyclic components, it is enough to count the
number of hypertrees $N(s)$ smaller than $C \log n$ in $H_s$, where $s = tn$.
We will first compute the expected value and then prove a law of large
numbers using a second moment method.

Let $h \ge 0$; we first compute the number of hypertrees with $h$ hyperedges
($h = 0$ corresponds to isolated vertices). These have $r = (d-1)h + 1$
vertices.
By Lemma 1 in Karohski-Luczak [TJ], there are 

$$\frac{(r-1)! \, r^{h-1}}{h! \, [(d-1)!]^h} \tag{10}$$

trees on $r = (d-1)h + 1$ labelled vertices (this is the analogue of Cayley's
(1889) well-known formula that there are $k^{k-2}$ ways to draw a tree on $k$
labelled vertices). If $T$ is a given hypertree with $h$ edges, with vertices
labelled by elements of $V = \{1, \ldots, n\}$, there are a certain number of
conditions that must be fulfilled in order for $T$ to be one of the
components of $H_s$: (i) the $h$ hyperedges of $T$ must be open; (ii) the
$\binom{r}{d} - h$ other hyperedges inside $T$ must be closed; (iii) $T$ must
be disconnected from the rest of the graph, which requires closing
$r \binom{n-r}{d-1}$ hyperedges.
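Formula (10) can be sanity-checked on small cases. The sketch below evaluates it with exact rational arithmetic; the function name is hypothetical. For $d = 2$ it reduces to Cayley's $r^{r-2}$, and for $d = 3$, $h = 2$ it gives 15, which matches a direct count (5 choices for the shared vertex, then 3 ways to split the remaining four vertices into two pairs).

```python
from fractions import Fraction
from math import factorial

def count_hypertrees(d, h):
    """Evaluate (r-1)! r^(h-1) / (h! ((d-1)!)^h) with r = (d-1) h + 1, as an exact integer."""
    r = (d - 1) * h + 1
    value = (Fraction(factorial(r - 1)) * Fraction(r) ** (h - 1)
             / (factorial(h) * factorial(d - 1) ** h))
    assert value.denominator == 1   # the count must be an integer
    return int(value)

print([count_hypertrees(2, h) for h in range(5)])   # Cayley: r^(r-2) for r = 1..5
print(count_hypertrees(3, 1), count_hypertrees(3, 2))
```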

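Now, remark that at time $s = tn$, because the individual Poisson clocks
are independent, each hyperedge is present independently of the others with
probability $p = 1 - \exp(-s/\binom{n}{d}) \sim d! \, t / n^{d-1}$. It follows
that the probability that $T$ is one of the components of $H_s$ is

$$p^h (1 - p)^{\binom{r}{d} - h + r\binom{n-r}{d-1}}. \tag{11}$$

Hence the expected number of trees in $H_s$ with $h$ edges, which we denote
by $N_h(s)$, satisfies

$$E[N_h(s)] = \binom{n}{r} \frac{(r-1)! \, r^{h-1}}{h! \, [(d-1)!]^h} \, p^h (1 - p)^{\binom{r}{d} - h + r\binom{n-r}{d-1}} \tag{12}$$

$$\sim n \, \frac{r^{h-2}}{h!} (dt)^h e^{-drt}.$$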

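Write $\mathcal{C}$ for the set of connected components of $H_s$. Note that
if $T_1$ and $T_2$ are two given hypertrees on $V$ with disjoint vertex sets
and with $h$ hyperedges each, then

$$P(T_1 \in \mathcal{C} \text{ and } T_2 \in \mathcal{C}) = \frac{P(T \in \mathcal{C})^2}{(1-p)^{r^2 \binom{n-2r}{d-2}}},$$

where $P(T \in \mathcal{C})$ is given by (11) and the exponent counts the
hyperedges counted in (11) for both $T_1$ and $T_2$. From this we deduce that
$\mathrm{cov}(1_{\{T_1 \in \mathcal{C}\}}, 1_{\{T_2 \in \mathcal{C}\}}) \to 0$
and that $\mathrm{var}(N_h(s)) = o(n^2)$. Thus, by Chebyshev's inequality:

$$\frac{1}{n} N_h(s) \to \frac{((d-1)h+1)^{h-2}}{h!} (dt)^h e^{-dt(h(d-1)+1)}, \tag{13}$$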

in probability as $n \to \infty$. The end of the proof of the proposition now
follows from (13) and the following bound:

$$\limsup_{h_0 \to \infty} \limsup_{n \to \infty} \frac{1}{n} E(N_{>h_0}(s)) = 0, \tag{14}$$

where $N_{>h_0}(s) = \sum_{h=h_0+1}^{\infty} N_h(s)$. Indeed, for every
$\varepsilon > 0$ and $\eta > 0$, we can choose $h_0$ large enough such that
the finite sum in the right-hand side of (13) lies within $\varepsilon$ of
the infinite series. We then choose $n$ large enough so that
$E(N_{>h_0}(s))/n < \varepsilon \eta$, whence by Markov's inequality:

$$P(N_{>h_0}(s) > \varepsilon n) < \eta.$$

We now conclude using (13). To obtain the bound (14) we use (12), from
which it follows (using $\binom{n}{r} \le n^r / r!$ and $1 - e^{-x} \le x$),
But since $r = (d-1)h + 1 \le C \log n$, we see that

$$E(N_h(s)) \le n \, \frac{(dt)^h r^{h-2}}{h!} \exp(-rdt + o(1)),$$

where the term $o(1)$ is uniform in $r \le C \log n$. Using Stirling's formula
we obtain a uniform exponential bound for $E(N_h(s)/n)$ provided that
$t \ne 1/[d(d-1)]$. (14) now follows. □

3.2 Bounds for the Cayley distance on the symmetric group 

In the case of random transpositions we had the convenient formula that if
$\sigma \in S_n$ then $d(o, \sigma) = n - \#\text{cycles}$, a formula
originally due to Cayley. In the case of random $k$-cycles with $k \ge 3$,
unfortunately there is to our knowledge no exact formula to work with.
However this formula stays approximately true, as shown by the following
proposition.

Proposition 11. Let $k \ge 3$ and let $\sigma \in S_n$. (If $k$ is odd, assume
further that $\sigma \in A_n$.) Then

$$\frac{1}{k-1}(n - |\sigma|) \le d(o, \sigma) \le \frac{1}{k-1}(n - |\sigma|) + C(k) |R_k(\sigma)|,$$

where $|\sigma|$ is the number of cycles of $\sigma$, $C(k)$ is a universal
constant depending only on $k$, and $R_k(\sigma)$ is the set of cycles of
$\sigma$ whose length $\ell \not\equiv 1 \bmod (k-1)$.

Proof. For simplicity we consider only the case $k = 3$. Thus let
$\sigma \in A_n$. For each cycle of odd length $(i_1, \ldots, i_{2r+1})$ we
can write

$$(i_1, \ldots, i_{2r+1}) = (i_1, i_2, i_3)(i_3, i_4, i_5) \cdots (i_{2r-1}, i_{2r}, i_{2r+1}),$$

which has exactly $r$ 3-cycle factors. Now, because $\sigma \in A_n$, the
number of cycles of even length must be even. So let
$(i_1, \ldots, i_{2r})(j_1, \ldots, j_{2m})$ be a pair of even cycles. Then
we start by building

$$(i_1, i_2)(j_1, j_2) = (i_1, i_2, j_1)(i_2, j_1, j_2)$$

in two moves and then completing each of the cycles in the same way as
above. The total number of moves to build this pair of cycles is thus
$2 + (r - 1) + (m - 1) = r + m$. It follows that $\sigma$ can be made up of
at most

$$\sum_{c \notin R_3(\sigma)} \frac{1}{2}(|c| - 1) + \sum_{c \in R_3(\sigma)} \frac{1}{2}|c| = \frac{1}{2}(n - |\sigma|) + \frac{1}{2}|R_3(\sigma)|$$

3-cycles. This gives the upper bound. On the other hand, multiplying
$\sigma$ by a 3-cycle can create at most two new cycles. Hence, after $p$
multiplications the resulting permutation cannot have more than
$|\sigma| + 2p$ cycles. Therefore the distance must be at least that $k_0$
for which $|\sigma| + 2k_0 \ge n$, since the identity permutation has exactly
$n$ cycles. The lower bound follows. □
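Both factorisations used in the proof, and the resulting bounds for $k = 3$, can be checked by brute force on a small alternating group. The sketch below assumes products are read right to left and takes $C(3) = 1/2$, the constant produced by the construction in the proof; the helper names are hypothetical, and the breadth-first search is only feasible for very small $n$.

```python
from collections import deque
from itertools import permutations

def compose(f, g):
    """Composition 'f after g' on tuples: apply g first, then f."""
    return tuple(f[g[x]] for x in range(len(f)))

def cycle(points, n):
    """The cyclic permutation points[0] -> points[1] -> ... -> points[0] on {0,...,n-1}."""
    perm = list(range(n))
    for a, b in zip(points, points[1:] + points[:1]):
        perm[a] = b
    return tuple(perm)

def cycle_lengths(perm):
    """Lengths of all cycles of perm, fixed points included."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        x, size = start, 0
        while x not in seen:
            seen.add(x)
            x, size = perm[x], size + 1
        if size:
            lengths.append(size)
    return lengths

def three_cycle_distances(n):
    """BFS distances from the identity in the Cayley graph generated by all 3-cycles."""
    gens = {cycle(list(p), n) for p in permutations(range(n), 3)}
    identity = tuple(range(n))
    dist, queue = {identity: 0}, deque([identity])
    while queue:
        s = queue.popleft()
        for g in gens:
            nxt = compose(s, g)
            if nxt not in dist:
                dist[nxt] = dist[s] + 1
                queue.append(nxt)
    return dist

# The two identities used in the proof.
print(cycle([0, 1, 2, 3, 4], 5) == compose(cycle([0, 1, 2], 5), cycle([2, 3, 4], 5)))
print(compose(cycle([0, 1], 4), cycle([2, 3], 4))
      == compose(cycle([0, 1, 2], 4), cycle([1, 2, 3], 4)))

# The bounds of Proposition 11 for k = 3 on A_6.
n = 6
dist = three_cycle_distances(n)
ok = all(
    (n - len(ls)) / 2 <= d <= (n - len(ls)) / 2 + sum(l % 2 == 0 for l in ls) / 2
    for perm, d in dist.items()
    for ls in [cycle_lengths(perm)]
)
print(len(dist), ok)
```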
3.3 Phase transition for the 3-cycle random walk

We now finish the proof of Theorem 3 in the case of random $k$-cycles.

Proof of Theorem 3 if $k_j = \delta_{k,j}$. The proof follows the lines of Lemma
Let $N(t)$ be the number of cycles of $\sigma_t$ and let $\bar N(t)$ be the
number of components in $H_t$, where $(H_t, t \ge 0)$ is the random
$k$-regular hypergraph process obtained by adding the edge
$\{x_1, \ldots, x_k\}$ whenever the $k$-cycle $(x_1, \ldots, x_k)$ is
performed. Then note again that every cycle of $\sigma_t$ is a subset of a
connected component of $H_t$, so $N(t) \ge \bar N(t)$. (Indeed, this
property is a deterministic statement for transpositions, and a sequence of
random $k$-cycles can be decomposed as a sequence $(k-1)$ times as long of
transpositions.) Repeating the argument in Lemma 6 we see that

$$n^{-3/4} \Big( \sup_{t \le cn} |N(t) - \bar N(t)| \Big) \to 0, \tag{15}$$

in probability. This is proved in greater generality (i.e., for arbitrary
conjugacy classes) in Lemma HH Moreover, any $c \in R_k(\sigma_t)$ must have
been generated by fragmentation at some point (otherwise the length of
cycles only increases by $k - 1$ each time). Thus
$|R_k(\sigma_t)| \le N(t) - \bar N(t)$, and Theorem 3 now follows.

□ 

4 Proofs for general conjugacy classes 
4.1 Random graph estimates 

Let $\Gamma = (k_2, \ldots, k_J)$ be our fixed conjugacy class. A first step
in the proof of Theorems 3 and 2 in this general case is again to associate
a certain random graph model to the random walk. As usual, we put a
hyperedge connecting $x_1, \ldots, x_k$ every time a cycle
$(x_1 \ldots x_k)$ is performed as part of a step of the random walk. Let
$H_s$ be the random graph on $n$ vertices that is obtained at time $s$. A
first step will be to prove properties of this random graph $H_s$ when
$s = tn$ for some constant $t > 0$. Recall our definition of $t_c$:

$$t_c^{-1} = \sum_{j=2}^{J} j(j-1) k_j, \tag{16}$$

and that $1 - \theta$ is the smallest solution of the equation (in $z$):
$G_t(z) = z$, where

$$G_t(z) = \exp\Big( -t \sum_{j=2}^{J} j k_j + t \sum_{j=2}^{J} j k_j z^{j-1} \Big). \tag{17}$$
Lemma 12. If $t < t_c$ then there exists $\beta > 0$ such that all clusters of
$H_{tn}$ are smaller than $\beta \log n$ with high probability. If
$t > t_c$, then there exists $\beta > 0$ such that all but one clusters are
smaller than $\beta \log n$ and the largest cluster $L_n(t)$ satisfies

$$\frac{L_n(t)}{n} \to \theta(t)$$

in probability.



Proof. We first consider a particular vertex, say $v \in V$, and ask what is
its degree distribution in $H_{tn}$. Write
$\sigma_t = \gamma_1 \ldots \gamma_{N_t}$ where $(\gamma_i, i \ge 1)$ is a
sequence of i.i.d. permutations uniformly distributed on $\Gamma$, and
$(N_t, t \ge 0)$ is an independent Poisson process. Note that for $t \ge 0$,
$\#\{i \le N_t : v \in \mathrm{Supp}(\gamma_i)\}$ is a Poisson random variable
with mean $t \sum_{j=2}^{J} j k_j / n$. Thus by time $tn$, the number of times
$v$ has been touched by one of the $\gamma_i$ is a Poisson random variable
with mean $t \sum_{j=2}^{J} j k_j$. For each such $\gamma_i$, the probability
that $v$ was involved in a cycle of size exactly $\ell$ is precisely
$\ell k_\ell / \sum_{j=2}^{J} j k_j$. Thus, the number of hyperedges of size
$j$ that contain $v$ in $H_{tn}$ is $P_j$, where $(P_j, j = 2, \ldots, J)$ are
independent Poisson random variables with parameter $t j k_j$. Since each
hyperedge of size $j$ corresponds to $j - 1$ vertices, we see that the degree
of $v$ in $H_{tn}$, $D_v$, has a distribution given by

$$D_v = \sum_{j=2}^{J} (j - 1) P_j. \tag{18}$$

Now, note that by definition of $t_c$ (see (16)),

$$E(D_v) > 1 \iff t > t_c.$$
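The equivalence above rests on a one-line computation, spelled out here using only (16), (18) and the Poisson parameters $t j k_j$ of the variables $P_j$:

```latex
\begin{align*}
E(D_v) = \sum_{j=2}^{J} (j-1)\,E(P_j)
       = t \sum_{j=2}^{J} j(j-1)\,k_j
       = \frac{t}{t_c},
\end{align*}
```

so that $E(D_v) > 1$ exactly when $t > t_c$.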



The proof of Theorem 3.2.2 in Durrett [11] may be adapted almost verbatim
to show that there is a giant component if and only if $E(D_v) > 1$, and
that the fraction of vertices in the giant component is the survival
probability of the associated branching process. Note that the generating
function associated with the progeny (18) is

$$G_t(z) := E(z^{D_v}) = \prod_{j=2}^{J} E(z^{(j-1)P_j}) = \prod_{j=2}^{J} \exp\big( t j k_j (z^{j-1} - 1) \big) = \exp\Big( -t \sum_{j=2}^{J} j k_j + t \sum_{j=2}^{J} j k_j z^{j-1} \Big),$$

thus $\rho(t) = 1 - \theta(t)$ is the smallest root of the equation
$G_t(z) = z$. From the same result one also gets that the second largest
cluster is of size no more than $\beta \log n$ with high probability, for
some $\beta > 0$. □
Let $N(s)$ be the number of clusters at time $s$ in $H_s$, and let
$u_n(t) = \frac{1}{n} E(N(tn))$. Define a function $u(t)$ by putting:

$$u(t) = 1 - K \int_0^t [1 - \theta(s)^2] \, ds, \tag{19}$$

where $K := \sum_{j=2}^{J} k_j (j - 1)$, and note that $u(0) = 1$, that for
$t \le t_c$ we have $u(t) = 1 - Kt$, and that $u(t) > 1 - Kt$ for $t > t_c$.

Lemma 13. As $n \to \infty$, we have

$$u_n(t) \to u(t),$$

uniformly on compacts in probability.

Proof. Let $H$ denote a hypergraph on $\{1, \dots, n\}$, and let $h = h_1 \cup \dots \cup h_\ell$
be a set of hyperedges. Denote by $H' = H + h$ the graph obtained from
$H$ by adding the hyperedges $h_1, \dots, h_\ell$ to $H$. Let $(x_1, \dots, x_n)$ be a
discrete partition of unity, i.e., a non-increasing sequence of numbers such
that $\sum_{i=1}^{n} x_i = 1$ and $n x_i$ is a nonnegative integer. Define a function
$f(x_1, \dots, x_n)$ as follows. Let $H$ be any hypergraph for which $(x_i)$ are the
normalized cluster sizes. Let $h = h_1 \cup \dots \cup h_\ell$ be a collection of hyperedges
of sizes $2, 3, \dots, J$ (with size $j$ being of multiplicity $k_j$), where the hyperedges
$h_i$ are sampled uniformly at random without replacement from $\{1, \dots, n\}$.
Let $H' = H + h$. Then we define $f$ by putting

$$f(x_1, \dots, x_n) := E(|H'| - |H|)$$

where $|H|$ denotes the number of clusters of $H$. Then we have that

$$M_t := \frac{1}{n} |H(tn)| - \int_0^t f(x_1(sn), \dots, x_n(sn))\, ds \tag{20}$$

is a martingale, if $(x_1(s), \dots, x_n(s))$ denote the ordered normalized cluster
sizes of $H(s)$. (Note that $M_0 = 1$.) Thus, taking expectations,

$$u_n(t) = 1 + \int_0^t E[f(x_1(sn), \dots, x_n(sn))]\, ds.$$

We claim that, as $n \to \infty$, for every $s$ fixed,

$$E(f(x_1(sn), \dots, x_n(sn))) \to -K(1 - \theta(s)^2), \tag{21}$$
where $K = \sum_{j=2}^{J} k_j (j-1)$. To see this, note that for every hyperedge
$h = \{i_1, \dots, i_j\}$ of size $2 \le j \le J$ which is added to the graph, the increase
in the number of clusters is the same as if we successively add the edges
$\{i_1, i_2\}, \dots, \{i_{j-1}, i_j\}$. Let us compute the expected gain when adding the
edge $\{i_{k-1}, i_k\}$. Summing over $k$ gives us the expected gain after adding the
hyperedge $h$ by linearity of the expectation, and summing over hyperedges will
give us the value of $f$. Now, condition on what happens by the time we add
the edge $\{i_{k-1}, i_k\}$. If the cluster sizes are $(x_1, \dots)$, then either $i_k$ falls into
the same component as $i_{k-1}$, in which case the number of components does
not change, or $i_k$ falls in a different component, in which case the number
of clusters decreases by 1. Hence, the expected gain at this stage is



$$\sum_{i \ge 1} x_i \big[-(1 - x_i)\big] = -1 + \sum_{i \ge 1} x_i^2.$$



As $n \to \infty$, by Lemma 12 this converges to $-1 + \theta(s)^2$. (Note in particular
that this limit is independent from what happened during the earlier edges
added to $H$.) Since there are $(j - 1)$ edges to add for a hyperedge of size
$j$ and $k_j$ such hyperedges, (21) follows. Using the Lebesgue convergence
theorem, we deduce that

$$u_n(t) \to 1 - K \int_0^t (1 - \theta^2(s))\, ds = u(t).$$
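The single-edge computation used in this argument can be verified exactly on a small example. The sketch below is an illustration under a simplifying assumption: the two endpoints are chosen independently and uniformly (so they may coincide), whereas the hyperedges in the text are sampled without replacement; the difference is of order $1/n$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def expected_gain_single_edge(cluster_sizes):
    """Exact expected change in the number of clusters when one edge {u, v} is added,
    with u and v chosen independently and uniformly (an assumption of this sketch)."""
    n = sum(cluster_sizes)
    # label[v] is the index of the cluster containing vertex v
    label = [c for c, size in enumerate(cluster_sizes) for _ in range(size)]
    total = 0
    for u, v in product(range(n), repeat=2):
        if label[u] != label[v]:
            total -= 1          # the edge merges two distinct clusters
    return Fraction(total, n * n)

def gain_formula(cluster_sizes):
    """The closed form -1 + sum_i x_i^2 with x_i the normalized cluster sizes."""
    n = sum(cluster_sizes)
    return -1 + sum(Fraction(s, n) ** 2 for s in cluster_sizes)

sizes = [3, 2, 1]
print(expected_gain_single_edge(sizes), gain_formula(sizes))
```

Both functions return the same rational number for any list of cluster sizes, which is the identity $\sum_i x_i[-(1 - x_i)] = -1 + \sum_i x_i^2$.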

To obtain convergence in the u.c.p. sense (uniform on compacts in probability),
we note that

$$\mathrm{var}(|H'| - |H|) \le C \tag{22}$$

for some constant $C$ which depends only on $(k_2, \dots, k_J)$, since $|H'|$ may
differ from $|H|$ only by a bounded amount. Now, by Doob's inequality, if
$\tilde M_s = n(M_s - 1)$:

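$$P\Big(\sup_{s \le t} |M_s - 1| > \varepsilon\Big) = P\Big(\sup_{s \le t} |\tilde M_s|^2 > n^2 \varepsilon^2\Big) \le \frac{4\, \mathrm{var}(\tilde M_t)}{n^2 \varepsilon^2} \le \frac{4Ct}{n \varepsilon^2}. \tag{23}$$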

The last inequality is obtained by conditioning on the number of steps
$N$ between times $0$ and $tn$, noting that after each step, the variance of $\tilde M$
increases by at most $C$ by (22). Hence, to conclude the proof of Lemma 13
it suffices to show that we have the convergence:

$$\int_0^t f(x_1(sn), \dots)\, ds \to -K \int_0^t (1 - \theta(s)^2)\, ds, \quad \text{u.c.p.} \tag{24}$$

This is a direct consequence of the fact that as $n \to \infty$:

$$f(x_1(sn), \dots) \to -K(1 - \theta(s)^2), \quad \text{u.c.p.},$$

which itself follows from pointwise convergence in probability, monotonicity
in $s$, and the fact that the limiting function is continuous. (Monotonicity
comes from a simple coupling argument, using the fact that $H(t)$ is a purely
coalescing process.) □
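The convergence in Lemma 13 can be observed in a small simulation. The sketch below is illustrative only and makes two simplifying assumptions that are not in the text: it uses exactly $\lfloor tn \rceil$ steps instead of a Poisson number, and it encodes the conjugacy class as a hypothetical dictionary `{j: k_j}`. At each step the hyperedges are disjoint and drawn without replacement, and a union-find structure tracks the clusters.

```python
import random

def cluster_fraction(n, t, k, seed=0):
    """Return (number of clusters of the hypergraph after round(t*n) steps) / n.

    k = {j: k_j} is the cycle structure of a step (assumed encoding).
    """
    rng = random.Random(seed)
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    # sizes of the hyperedges added at each step, e.g. {2: 1, 3: 2} -> [2, 3, 3]
    sizes = [j for j, kj in sorted(k.items()) for _ in range(kj)]
    clusters = n
    for _ in range(int(round(t * n))):
        pts = rng.sample(range(n), sum(sizes))   # distinct points for disjoint hyperedges
        pos = 0
        for j in sizes:
            edge = pts[pos:pos + j]
            pos += j
            root = find(edge[0])
            for v in edge[1:]:
                r = find(v)
                if r != root:                     # merge two distinct clusters
                    parent[r] = root
                    clusters -= 1
    return clusters / n

# Below t_c the limit is u(t) = 1 - K t.
print(cluster_fraction(20000, 0.25, {2: 1}))   # close to 1 - 0.25 = 0.75
print(cluster_fraction(20000, 0.10, {3: 1}))   # close to 1 - 2 * 0.10 = 0.80
```

Only the subcritical regime is compared with a closed form here, since for $t \le t_c$ one has $u(t) = 1 - Kt$ without needing $\theta$.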



4.2 Random walk estimates 

Lemma 14. Let $N(t)$ be the number of cycles of $\sigma(t)$. Then we have, as
$n \to \infty$:

$$n^{-3/4} \big(N(tn) - \bar N(tn)\big) \to 0, \quad \text{u.c.p.} \tag{25}$$

Proof. This is very similar to Lemma [HJ Say that a cycle is large or small, 
depending on whether it is bigger or smaller than $\sqrt{n}$. To start with, observe
that there can never be more than $\sqrt{n}$ large cycles. As usual, we have that
$N(t) \ge \bar N(t)$, and we let $N^{ex}(t) = N(t) - \bar N(t)$ be the excess number of
cycles. This in turn can be decomposed as $N^{ex}(t) = N^{ex}_{\uparrow}(t) + N^{ex}_{\downarrow}(t)$, where
the subscripts $\uparrow$ and $\downarrow$ refer to the fact that the cycles are either large or
small. Thus we have

$$N^{ex}_{\uparrow}(t) \le \sqrt{n},$$

and the problem is to control $N^{ex}_{\downarrow}(t)$. Writing every cycle of size $j$ as a
product of $j - 1$ transpositions, we may thus write $\sigma_t = \prod_{i=1}^{m_t} \tau_i$, for a sequence
of transpositions having a certain distribution (they are not independent).
Then $N^{ex}_{\downarrow}(t) \le F_{\downarrow}(t)$, where $F_{\downarrow}(t)$ is the number of times $1 \le i \le m_t$
that the transposition $\tau_i$ yields a fragmentation event for which one of the
fragments is small. However, conditionally on $\tau_1, \dots, \tau_{i-1}$, the conditional
probability that $\tau_i$ yields such a fragmentation is still bounded by $4 n^{-1/2}$.
Since $m_t = K N_t$, where $K = \sum_{j=2}^{J} (j-1) k_j \ge 1$ and $N_t$ is a Poisson random
variable with mean $t$, it follows that

$$E\Big(\sup_{s \le tn} F_{\downarrow}(s)\Big) \le 4 K t\, n^{1/2}.$$

Thus by Markov's inequality,

$$P\Big(\sup_{s \le tn} F_{\downarrow}(s) > n^{3/4}\Big) \to 0. \tag{26}$$

Hence, $n^{-3/4} |N(tn) - \bar N(tn)|$ converges to 0 u.c.p., which concludes the proof
by Lemma HH1 □ 
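The comparison between cycles and clusters can be explored numerically in the simplest case of random transpositions. The sketch below is an illustration only (the function names are hypothetical, and a fixed number of steps replaces the Poisson clock): it applies the same sequence of transpositions to a permutation and to the associated graph. Every cycle stays inside a single cluster, so the number of cycles is never smaller than the number of clusters, and the excess is what the lemma controls.

```python
import random

def count_cycles(perm):
    """Number of cycles (fixed points included) of a permutation given as a list."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            v = start
            while not seen[v]:
                seen[v] = True
                v = perm[v]
    return cycles

def cycles_and_clusters(n, steps, seed=0):
    """Apply `steps` uniform random transpositions; return (#cycles, #clusters)."""
    rng = random.Random(seed)
    perm = list(range(n))      # identity permutation
    parent = list(range(n))    # union-find forest for the associated graph

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    clusters = n
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]   # multiply by the transposition (i j)
        ri, rj = find(i), find(j)
        if ri != rj:                           # the edge {i, j} merges two clusters
            parent[ri] = rj
            clusters -= 1
    return count_cycles(perm), clusters

n = 10000
cycles, clusters = cycles_and_clusters(n, n, seed=1)   # time t = 1 > t_c = 1/2
print(cycles, clusters, cycles - clusters)
```

In runs of this kind the excess is typically tiny compared with $n^{3/4}$, in line with the bound above.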

Note in particular that by combining Lemma 13 with Lemma 14 we get
that

$$\frac{1}{n} N(tn) \to u(t), \quad \text{u.c.p.} \tag{27}$$
Lemma 15. Let $t > t_c$. Then $\tau_\delta < tn$ with high probability, where

$$\delta := \frac{2^{-K}}{t} \int_0^t \theta^2(s)\, ds > 0, \tag{28}$$

where $K = \sum_{j=2}^{J} (j-1) k_j$.

Remark 16. Note that Lemma [T5\ immediately implies Theorem^ 

Proof. The idea is to say that, since we know that the number of cycles is
approximately the number of clusters in the random graphs, this implies a
nonlinearity in the behaviour of this number. In turn, this means there are
many fragmentations and thus that there are some large clusters.

To formalize this, assume that a permutation $\sigma$ has a cycle structure
$(C_1, \dots, C_r)$ with normalized cycle sizes $x_i = |C_i|/n$.
Define a function $g(x_1, \dots, x_r)$ by putting

$$g(x_1, \dots, x_r) := E(|\sigma'| - |\sigma|),$$

where $\sigma' = \sigma \cdot \gamma$ and $\gamma$ is a uniform random element from $\Gamma$, while $|\sigma|$ denotes
the number of cycles of $\sigma$. Then if we define a process

$$M'_t = \frac{1}{n} N(tn) - \int_0^t g(x_1(sn), \dots)\, ds,$$

then $(M'_t, t \ge 0)$ is a martingale started from $M'_0 = 1$. Moreover, writing
$\gamma = \tau_1 \cdot \ldots \cdot \tau_K$, where the $\tau_i$ are transpositions, and if we let $\sigma_i = \sigma \cdot \tau_1 \cdots \tau_i$, so
that $\sigma_0 = \sigma$ and $\sigma_K = \sigma'$, then

$$g(x_1, \dots, x_r) = \sum_{i=1}^{K} E(|\sigma_i| - |\sigma_{i-1}|).$$

Recall that the transposition $\tau_i$ can only cause a coalescence or a fragmentation,
in which case the number of cycles decreases or increases by 1. If the
relative cycle sizes of $\sigma_{i-1}$ are given by $(y_1, \dots, y_r)$, it follows that

$$-1 \le E(|\sigma_i| - |\sigma_{i-1}|) \le -1 + 2 y^* \frac{n}{n - i + 1},$$

where $y^* = \max(y_1, \dots, y_r)$. Moreover, $y^* \le 2^i y_0^*$, where $y_0^*$ denotes the
largest relative cycle size of $\sigma_0 = \sigma$.

From this we obtain directly that with high probability (uniformly on
compact sets)

$$\int_0^t g(x_1(sn), \dots)\, ds \le \int_0^t K \big[-1 + 2^K x^*(sn)\big]\, ds, \tag{29}$$

where $x^*(s) = \max(x_1(s), \dots, x_r(s))$. On the other hand, using Doob's
inequality in the same way as (23), we also have:

$$P\Big(\sup_{s \le t} |M'_s - 1| > \varepsilon\Big) \le \frac{4Ct}{n \varepsilon^2}. \tag{30}$$
Combining this information with (27), we obtain, with high probability
uniformly on compact sets:

$$\int_0^t \big[-1 + 2^K x^*(sn)\big]\, ds \ge \int_0^t \big[-1 + \theta^2(s)\big]\, ds. \tag{31}$$

From this we get, since Yll=i x i{ sn ) 2 — A n (t), with high probability 

$$t\, 2^K \sup_{s \le tn} x^*(s) \ge \int_0^t \theta^2(s)\, ds, \tag{32}$$

i.e., $\tau_\delta < tn$. □

4.3 Distance estimates 

We are now ready to prove that

$$\frac{1}{n}\, d(\sigma_{tn}) \to \varphi(t),$$

uniformly on compact sets in probability as $n \to \infty$ except possibly on some
compact interval $I$ in $(t_c, \infty)$, where

$$\varphi(t) = \frac{1 - u(t)}{K} = \int_0^t \big(1 - \theta(s)^2\big)\, ds. \tag{33}$$

The proof is analogous to, but more complicated than, that of Proposition 11.
Note that if $\sigma$ is a permutation, every transposition can at most increase
the number of cycles by 1. Hence if $\sigma$ has $N(\sigma)$ cycles, after one step $s \in \Gamma$,
$\sigma$ has at most $N(\sigma) + K$ cycles. Thus after $p$ steps, the number of cycles
of $\sigma$ is at most $N(\sigma) + Kp$. Since the identity permutation has exactly $n$
cycles, we conclude that

$$d(\sigma) \ge \frac{1}{K} (n - N(\sigma)). \tag{34}$$

Together with Lemma 14 and the definition of $\varphi(t)$, this proves the lower
bound in Theorem 3.
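The lower bound (34) depends only on the number of cycles and is straightforward to evaluate. Here is a minimal sketch; the function names are hypothetical, and permutations are represented as Python lists with `perm[i]` the image of `i`.

```python
def count_cycles(perm):
    """Number of cycles (fixed points included) of a permutation given as a list."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            v = start
            while not seen[v]:
                seen[v] = True
                v = perm[v]
    return cycles

def distance_lower_bound(perm, K):
    """Smallest integer p with K * p >= n - N(sigma), i.e. the bound (34) rounded up.

    K = sum_j (j - 1) k_j is the number of transpositions hidden in one step.
    """
    deficit = len(perm) - count_cycles(perm)
    return -(-deficit // K)     # integer ceiling of deficit / K

five_cycle = [1, 2, 3, 4, 0]
print(distance_lower_bound(five_cycle, K=1))   # steps are transpositions
print(distance_lower_bound(five_cycle, K=2))   # steps are 3-cycles
```

For transpositions ($K = 1$) the bound $n - N(\sigma)$ is the classical formula for the minimal number of transpositions needed to write $\sigma$, so it is attained in that case.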

Note that this bound would be sharp if we can find a path to the identity
which makes a fragmentation at each step. We now work our way towards
the upper bound, which shows that indeed such a path may be found, except
that we may have to add an additional $o(n)$ coagulation steps. Call a
component of $H_t$ good if it is a hypertree and bad otherwise; a hyperedge
is good if its component is good. Likewise, call a cycle $C$ of $\sigma(t)$ good if its
associated component in $H_t$ is a hypertree. Therefore, a good cycle is one
which has never been involved in fragmentations, i.e., its history consists
only of coagulation events. Fix $t > 0$ and write $\sigma(tn) = \sigma^g \cdot \sigma^b$, where $\sigma^g$
is the product of all good cycles of $\sigma(tn)$ while $\sigma^b$ is the product of all bad
cycles. Thus

$$\sigma^g = c^g_1 \cdots c^g_{r(g)}, \qquad \sigma^b = c^b_1 \cdots c^b_{r(b)}.$$

Note that by (26), and recalling that there can never be more than $\sqrt{n}$ cycles
of size greater than or equal to $\sqrt{n}$, we have $r(b) \le n^{3/4}$, say, and the total mass of cycles
in $\sigma^b$ is

$$\frac{|\sigma^b|}{n} = \theta(t) + o(1), \tag{35}$$

where $o(1)$ stands for a term that converges to 0 in probability, u.c.p. Assume
for simplicity that $\Gamma$ is an odd conjugacy class that generates all of
$S_n$ (the arguments below can easily be adapted otherwise). To start with,
note that in less than $o(n)$ moves, we can transform $\sigma(tn)$ into $\sigma'$ where
all the cycles $c^b_1, \dots, c^b_{r(b)}$ have been coagulated to form one large bad cycle,
leaving the good cycles unchanged. Thus $\sigma' = \sigma^g \cdot \sigma'^b$, where $\sigma'^b$ has only
one nontrivial cycle, whose size is $|\sigma^b|$. By the triangle inequality, it then
suffices to find a path between $\sigma'$ and the identity of length approximately
given by (34).

Roughly speaking, our strategy for constructing a path between $\sigma'$ and
the identity using steps from the conjugacy class $\Gamma$ is to systematically
destroy every good cycle as much as possible before destroying the bad cycles,
as the good cycles are slightly harder to destroy than the bad cycles. Indeed,
consider a cycle $C$ such that $|C| > |\Gamma|$. Then note that applying a judicious
permutation $s \in \Gamma$ to $C$ we can transform $C$ into $C'$ where the elements of
$C \setminus C'$ are now fixed points, and $|C'| = |C| - K$. Therefore, for an arbitrary
cycle $C$, we get that

$$C \text{ can be destroyed in at most } \frac{|C|}{K} + O(1) \text{ steps}, \tag{36}$$

where the term $O(1)$ is nonrandom, uniformly bounded in $C$ and $n$. This
bound is useful for the large bad cycle that makes up $\sigma'^b$, but does not help
for small (good) cycles, of which there are of order $n$.

However, if $C$ is a good cycle and $e_1, \dots, e_j$ are the hyperedges associated
with the component of $C$ in $G(tn)$ (corresponding to the application of
certain cycles as part of a step prior to time $tn$, say $\gamma_1, \dots, \gamma_j$, which we will
call the subcycles of $C$), then $C$ can be destroyed by applying successively $j$
random cycles $\gamma'_1, \dots, \gamma'_j$ of respective length $|e_1|, \dots, |e_j|$, in some specified
order. Unfortunately, it may not always be possible to perform exactly the
sequence $\gamma'_1, \dots, \gamma'_j$ as there are some arithmetic constraints on the sizes of
the cycles that can be performed (indeed, each application of a cycle must
be a part of the application of a permutation $s \in \Gamma$). A problem may thus
arise because, among good components, the smaller hyperedges tend to be
over-represented. This is made precise by the next lemma.
Lemma 17. Fix $t > 0$. Let $j \ge 2$ such that $k_j > 0$. Then the number
$U_j(tn)$ of good hyperedges of size $j$ in $H_{tn}$ satisfies

$$\frac{U_j(tn)}{n} \to t k_j (1 - \theta(t))^j \quad \text{in probability.}$$

Proof. The number of $j$-edges that have been added to $G(tn)$ is a Poisson
random variable with mean $t n k_j$. For each such edge, the probability that
it is not in the giant component converges to $(1 - \theta(t))^j$. [To see this,
note that by Lemma 12 it suffices to check that none of the $j$ points are
in a cluster of size greater than $\beta \log n$ for $\beta > 0$ large enough. This
involves checking a neighbourhood of these $j$ points so that no more than
$j \beta \log n$ vertices' connections are revealed. Since this is much smaller than
the $n^{1/2}$ neighbourhood size of the birthday problem, the explorations started
from the $j$ points do not intersect with high probability, and so behave
asymptotically independently.] Thus $E(U_j(tn)) \sim t n k_j (1 - \theta)^j$, while if $e$ and $e'$ are two
randomly chosen $j$-edges, $P(e \subset W, e' \subset W)$ converges for the same reasons
to $(1 - \theta(t))^{2j}$, so that $\mathrm{cov}(1_{\{e \subset W\}}, 1_{\{e' \subset W\}}) \to 0$. Thus the lemma follows
from the second moment method. □

Recall that $J$ is the maximal size of a cycle for a permutation $s \in \Gamma$, so
that the subcycles of size $J$ are the most under-represented among good cycles.
Consider the path that leads from $\sigma_J := \sigma'$ to $\sigma_{J-1}$ in $d_J = U_J(tn)/k_J$
steps, where $\sigma_{J-1}$ is the permutation obtained by destroying from $\sigma_J$ all
the subcycles of size $J$ from all good cycles and completing each step by
removing $k_j$ subcycles of size $j$ for $2 \le j \le J - 1$ among good cycles. At
this point we may write $\sigma_{J-1} = \sigma^g_{J-1} \cdot \sigma^b_{J-1}$, where $\sigma^b_{J-1} = \sigma'^b$ (so the bad
part is unchanged) and $\sigma^g_{J-1}$ is the same as $\sigma^g$ but all subcycles of size $J$
have been destroyed.

If $\Gamma$ consists only of $k$-cycles, then the estimate (36) with (35) finishes the
proof of the theorem in that case. Else, we still call the cycles of $\sigma^g_{J-1}$ good,
and note that they may still be decomposed in subcycles of size $j \le J - 1$.
We similarly construct inductively $\sigma_{J-2}, \dots$, where $\sigma_{j-1}$ is obtained
from $\sigma_j$ by removing from it all good subcycles of size $j$. Each time a
step $s = c_1 \cdots c_L$ is performed, where $L = \sum_{\ell=2}^{J} k_\ell$, we take $c_\ell$ from the
good subcycles of $\sigma_j$ if $|c_\ell| \le j$, while otherwise we use for $c_\ell$ vertices from $\sigma^b_j$. This
construction is possible so long as $\sigma^b_j$ does not "run out of mass". However,
by Lemma 17, for every $\varepsilon > 0$, with high probability the total mass that is
required from bad cycles in this procedure is no more than

$$M = \sum_{j=2}^{J} j k_j (1 + \varepsilon)\, t n \big[(1 - \theta(t))^2 - (1 - \theta(t))^j\big] \tag{37}$$



since $J$ is uniformly bounded in $n$. Thus if

$$M < \theta(t)\, n,$$

which is the initial mass of bad cycles (i.e., the mass of $\sigma^b_J = \sigma'^b$), then
the upper bound (and hence the result) follows from (36) and (35). Indeed,
in that case, we have constructed a path to the identity where the only
coagulations are made when going from $\sigma(tn)$ to $\sigma'$ and potentially when
finishing the destruction of the bad cycle. In any case that accounts for no more
than $o(n)$ such coagulations (with high probability). Referring to the remark
under (34), it follows that this path has a length of no more than

$$\frac{1}{K} (n - N(\sigma)) + o(n) = \varphi(t)\, n + o(n).$$

It thus remains solely to prove that $M < \theta n$ with high probability if $t > t_c$
is sufficiently close to $t_c$. However, using that $1 - (1 - x)^a \le ax$ if $a \ge 1$
and $0 < x < 1$, we see that for all $t > t_c$

$$\frac{M}{\theta n} \le (1 + \varepsilon)\, t\, (1 - \theta)^2 \sum_{j=2}^{J} k_j\, j (j - 2). \tag{38}$$

Thus it suffices to prove that the right-hand side is strictly smaller than
1 if $t$ is sufficiently close to $t_c$ or if $t$ is sufficiently large. When
$t \to t_c = \big(\sum_{j=2}^{J} k_j\, j (j-1)\big)^{-1}$, this is easily verified, at least provided that $\Gamma$ does
not consist solely of 2-cycles, in which case the result is already known. In
the case $t \to \infty$, this comes from the fact that there exists $c > 0$ such that
for $t$ large enough

$$1 - \theta(t) \le e^{-ct}.$$

In turn, this follows from the fact that $\theta(t)$ is the survival probability of a
Galton-Watson process where the offspring distribution is (18) and can thus
be bounded below stochastically by a Poisson random variable with mean
$t$. This finishes the proof of the result.

Remark 18. Based on numerical methods in several particular cases, we
expect that the inequality $M < \theta n$ holds in general (i.e., for all $t > t_c$ and
all conjugacy classes $\Gamma$ of finite length). This would imply in particular that
the limiting result for the behaviour of the distance $d(t)$ should hold for all
$t > 0$. In fact, in all the examples that we have looked at, the function $M/\theta$
appears to be monotone decreasing for $t > t_c$.

However, the upper bound (38) is too crude for this, as it can be shown
that the right-hand side does not have to be monotone and is in fact strictly
greater than 1 for some values of $t > t_c$ provided that $\Gamma$ contains cycles of
size large enough.
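A numerical exploration of the kind mentioned in this remark can be sketched as follows. This is an illustration under explicit assumptions, not the computation performed by the author: it assumes that the right-hand side of (38) reads $(1+\varepsilon)\, t\, (1-\theta(t))^2 \sum_j k_j\, j(j-2)$, that $\theta(t)$ is obtained from the smallest root of $G_t(z) = z$ with $G_t(z) = \exp\big(t \sum_j j k_j (z^{j-1} - 1)\big)$, and that the conjugacy class is encoded as a hypothetical dictionary `{j: k_j}`.

```python
import math

def theta(t, k, tol=1e-13, max_iter=200_000):
    """Survival probability: 1 minus the smallest root of G_t(z) = z in [0, 1],
    with G_t(z) = exp(t * sum_j j k_j (z^(j-1) - 1)) (assumed form)."""
    def G(z):
        return math.exp(t * sum(j * kj * (z ** (j - 1) - 1.0) for j, kj in k.items()))
    z = 0.0
    z_next = G(z)
    for _ in range(max_iter):
        if abs(z_next - z) < tol:
            break
        z = z_next
        z_next = G(z)
    return 1.0 - z_next

def rhs_38(t, k, eps=0.0):
    """Assumed right-hand side of (38): (1 + eps) t (1 - theta)^2 sum_j k_j j (j - 2)."""
    weight = sum(kj * j * (j - 2) for j, kj in k.items())
    return (1.0 + eps) * t * (1.0 - theta(t, k)) ** 2 * weight

three_cycles = {3: 1}                     # t_c = 1/6 for this class
for t in (0.17, 0.2, 0.3, 0.5):
    print(t, rhs_38(t, three_cycles))
```

Scanning `t` over a grid for classes with larger cycles is one way to look for the non-monotone behaviour described above; the sketch does not assert any particular outcome of such a scan.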